Collocation or Free Combination? ― Applying Machine Translation Techniques to identify collocations in Japanese
Abstract
This work presents an initial investigation into how to distinguish collocations from free combinations. The assumption is that, while free combinations can be translated literally, the overall meaning of a collocation differs from the sum of the translations of its parts. Based on that assumption, we test whether a machine translation system can help make this distinction. Results show that the approach improves precision compared with standard methods that identify collocations through statistical association measures.
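The assumption stated in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the function names, the romanized word pairs, the toy glossary, and the example machine translation outputs are all hypothetical, chosen only to show the idea that a word-by-word translation which fails to match the system's output may point to a collocation.

```python
def literal_translation(words, glossary):
    """Translate word by word with a glossary; unknown words are kept as-is."""
    return [glossary.get(w, w) for w in words]


def looks_like_collocation(words, mt_output, glossary):
    """Return True when some literally translated word is absent from the MT output.

    A mismatch suggests the expression is not translated compositionally,
    which is the signal the abstract associates with collocations.
    """
    literal = set(literal_translation(words, glossary))
    mt_tokens = set(mt_output.lower().split())
    return not literal.issubset(mt_tokens)


# Hypothetical toy glossary (romanized Japanese -> one English gloss per word).
glossary = {"kasa": "umbrella", "sasu": "point", "hon": "book", "kau": "buy"}

# Hypothetical MT outputs supplied by hand; no real MT system is called here.
candidates = [
    (["kasa", "sasu"], "open an umbrella"),
    (["hon", "kau"], "buy a book"),
]

for words, mt_output in candidates:
    label = "collocation?" if looks_like_collocation(words, mt_output, glossary) else "free combination?"
    print(" ".join(words), "->", label)
```

A real experiment would need a full bilingual dictionary, morphological analysis, and a tolerance for synonyms and inflection; exact token matching is used here only to keep the sketch short.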
Similar resources
Mining Japanese Compound Words and Their Pronunciations from Web Pages and Tweets
Mining compound words and their pronunciations is essential for Japanese input method editors (IMEs). We propose to use a chunk-based dependency parser to mine new words, collocations and predicate-argument phrases from large-scale Japanese Web pages and tweets. The pronunciations of the compound words are automatically rewritten by a statistical machine translation (SMT) model. Experiments on a...
Extracting Bilingual Collocations from Non-Aligned Parallel Corpora
This paper proposes a new method to find correspondences of uninterrupted collocations from Japanese-English bilingual corpora without sentence-to-sentence alignment. Uninterrupted collocations in English such as “once again”, “give up”, or “gross national product” handled as a single word or a compound word in Japanese, can be automatically extracted with corresponding Japanese words using wor...
Data Mining Meets Collocations Discovery
In this paper we discuss the problem of discovering interesting word sequences in the light of two traditions: sequential pattern mining (from data mining) and collocations discovery (from computational linguistics). Smadja (1993) defines a collocation as “a recurrent combination of words that cooccur more often than chance and that correspond to arbitrary word usages.” The notion of arbitrarin...
Book Reviews: Syntax-Based Collocation Extraction
Collocation is a common language phenomenon which has attracted the interest of researchers in many subfields of both theoretical and computational linguistics. Although there is no commonly accepted and precise definition of this phenomenon, collocations are generally understood as complex lexical items, often characterized as unpredictable, idiosyncratic, holistic, mutually selective, and so ...
Collocation translation based on sentence alignment and parsing
To date, substantial efforts have been devoted to the extraction of collocations from text corpora. However, only a few works deal with the subsequent processing of results in order for these to be successfully integrated into the NLP applications that could benefit from them (e.g., machine translation). This paper presents an accurate method for identifying translation equivalents of collocati...